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PATENT APPLICATION 
Docket No.: 1958.2004-000 



Date: J>^\k\ . '200b Express Mail Label No. I? L SS" Q^Q i "^^O US 



Inventors: Caleb E. Welton, Albert A. Hopeman, Andrew H, Goldberg 
Attorney's Docket No.: 1958,2004-000 

MULTIDIMENSIONAL DATABASE STORAGE AND RETRIEVAL SYSTEM 

BACKGROUND 

Relational databases are known which are used to store large quantities of 
normalized data. Traditional relational databases store data in the form of relations. A 
5 primary key is used to correspond to queried values. For a given primary key, there is 
associated data arranged in a two-dimensional form. Queries are performed by defining 
query values to be searched in the primary key, and traversing the relations to find the 
desired information. 

Many apphcations, however, have a need to structure data in multiple 
10 dimenstions. In multi-dimensional databases, data values are located at points in an 
n-dimensional conceptual space defined by specific positions along one or more axes. 
For example, three axes specifying month, product and sales district would delineate a 
three-dimensional space containing sales data. Each individual dollar value for sales 
would be identified by the combination of one specific point along each axis: the sales 
15 value for a specific month, product and sales district. The n-dimensional space can be 
visualized as a data cube. 

In the relational database, this example would be modeled by constructing a 
single primary key based on the month, product, and sales districts with an additional 
column for the value being stored. The relational model, therefore, requires additional 
20 space because it is storing the primary key as part of each row, and is fiirther inefficient 
with respect to queries within a given subplane of the cube, such as accessing all rows 
where sales district is limited to a specific value. 

In a typical multi-dimensional data cube, a storage location is allocated for every 
possible combination of every dimensional value. For example, in a three-dimensional 
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data cube wherein each dimension has three possible values, 27 storage locations are 
allocated, one for each possible combination of the dimensional values. The lowest 
level detail data is typically provided by data entry and each upper level data cell is then 
computed by aggregating the detail data to fill in the data cube. To speed access time 
5 for users, the data cube can be stored in main memory. 

In many multi-dimensional data cubes, especially large ones with many different 
dimensions, there are certain combinations of dimensional values for which there are 
relatively few data values. For example, if a data cube contains sales values 
dimensioned by product, region, and time, there may not be any sales values for certain 
10 products in certain regions during at least some time periods. In large 

multi-dimensional databases having several dimensions, it is fairly common for most of 
the data cube to be empty. Indeed, certain financial data may be much less than one 
percent populated. 

Dimensions having many dimensional value combinations for which there are 
1 5 relatively few data values are referred to as "sparse." For sparse dimensions, it is 

wasteful to allocate storage space for each possible combination of dimensional values 
because many of the combinations will contain no data values. Instead, storage space 
can be allocated only for those dimensional combinations having data values. The 
problem of efficiently allocating storage space in a multi-dimensional data cube is 
20 referred to as "sparsity management." Sparsity management may include combining 
multiple dimensional attributes into a composite tuple which allows several sparse 
dimensions to be combined, thereby saving storage space. 

Further, many dimensions define data values that are combinations of other data 
values. For example, in the sales data above, the time dimension might include 
25 attributes for months, quarters, and years, in which the data values for "January," 
"February," and "March" are combined for the data value for "Quarter L" Similarly, 
the data values for the four quarters are combined, or aggregated, to compute the value 
for "Year." Such attributes that are combinations of other attributes define a hierarchy 
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of associated data values, in which the data values corresponding to one attribute 
include the values in other attributes 

Often, computing the data values that are aggregated from other associated data 
values can be time consuming. Data values used in computing the aggregate values 
5 may not be stored in the same area of the storage medium. Retrieving the data values 
from the storage medium, such as a disk, for example, may require many fetches. The 
aggregation operation which computes the aggregate values may have to iterate through 
many associated data values. Multiple and often redundant fetches may need to be 
performed to fetch values stored on the same disk page. These additional fetches 
10 increase the time and resources required to complete the aggregation operation. In a 
large multidimensional database, such increases can be substantial. 

SUMMARY 

In a multidimensional database, an aggregation operation is performed in an 
15 optimal manner by storing the values included in the aggregation operation on the same 
disk page. A sparsity manager determines aggregate values which are computed from 
other data values during the aggregation operation. Each aggregate value is associated 
with one or more data values which are used during the aggregation operation to 
compute the aggregate value. The sparsity manager stores the associated data values in 
20 proximity to each other, such as on the same disk page (or a few disk pages), so that 
multiple disk page fetches may not be required for the same set of data values during the 
aggregation operation. The data values used in the aggregation operation can therefore 
be fetched once from a common disk page, and thereafter are found in memory, such as 
on a common cache page corresponding to the common disk page. In this manner, 
25 multiple fetches for data on the same disk page during the aggregation operation are 
avoided. 

The associations between the aggregate values and the corresponding data 
values define a hierarchy having parent-child relationships. The composite tuples used 
to store the data values are arranged so that the child values corresponding to a 
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particular parent, or aggregate, value are stored adjacent to each other. The sparsity 
manager traverses the hierarchy defined by the associations, and sorts the data values 
such that, for each aggregate value, the associated child values are stored together. 

The aggregate values are computed from their associated child values. Such a 
5 computation can be performed in an offline mode, such as overnight, or computed 
interactively in realtime. Further, an aggregate value can itself include other aggregate 
values. Therefore, by storing the data values associated with an aggregate value 
together, the associated data values used to compute each aggregate value are hkely to 
be stored on the same page or sets of pages, providing optimal disk I/O. 

1 0 BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other objects, features and advantages will be apparent from 
the following more particular description of embodiments of the multidimensional 
database storage and retrieval system, as illustrated in the accompanying drawings in 
which like reference characters refer to the same parts throughout the different views. 
15 The drawings are not necessarily to scale, emphasis instead being placed upon 
illustrating the principles of the claimed invention. 

Fig. 1 is a schematic block diagram of an on-line analytic processing (OLAP) 
system; 

Fig. 2 is a schematic block diagram of a prior art data cube depicting a 
20 multidimensional database; 

Figs. 3a and 3b are schematic block diagrams of a prior art storage array for 
storing the multidimensional database of Fig. 2; 

Fig. 4 is a prior art storage array for storing the data of table I below; 

Fig. 5 is a sparse storage descriptor for storing the sparse array segment 
25 corresponding to the data of Fig, 4; 

Fig. 6 is a more detailed view of the OLAP system for storing aggregate data 

values; 

Figs. 7a and 7b are examples of a hierarchy of values within a dimension; 
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Figs. 8a and 8b are examples of sparse data for populating the hierarchy of Figs. 
7a and 7b; 

Fig. 9 shows a flowchart depicting the storing of data values; 
Figs. 10a, 10b and 10c show an aggregation operation using aggregate data 
5 storage corresponding to the hierarchy and data of Figs. 7a-8b; and 

Fig, 1 1 shows storage of data values on a disk storage medium, 

DETAILED DESCRIPTION 

Fig. 1 is a schematic block diagram of an on-line analytic processing (OLAP) 
system. A server 1 responds to requests from a plurahty of chent users 20 202? 

10 20^. To satisfy client requests, the server 1 retrieves data from a data storage warehouse 
30, which can include various databases^ such as relational databases 32, 
multi-dimensional databases 34 and temporary databases 36 stored on disk. In 
particular, the multi-dimensional databases 34 are embodiments of a multi-cube model, 
where each data measure, such as "sales" or "expense" or "margin", has only the 

1 5 dimensions it needs . 

The server 1 includes at least one central processing unit (CPU) 2^, I2, 2p. 
The CPUs 2 execute client or user sessions and system management processes to 
operate on data stored in memory 10, which includes an OLAP engine 12 and a cache 
memory 18. The OLAP engine 12 includes a kernel 13 and a sparsity manager 15. 

20 The user sessions and system management processes can include processing 

threads managed in a multi- threaded OLAP engine 12. That is, user sessions can 
accomplish tasks by asynchronously executing processing threads. The disclosed 
embodiments can take the form of computer-executable instructions embedded in a 
computer-readable format on a CD-ROM, floppy or hard disk, or another 

25 computer-readable distribution medium. These instructions are executed by one or 

more CPUs 1\, I2, ...^ 2 p to implement the OLAP engine 12. A particular embodiment 
of the OLAP engine is commercially available as Oracle Express Server, version 6.3.1, 
from Oracle Corporation. 
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Fig. 2 is a schematic block diagram of a prior art data cube. As illustrated, the 
data cube 34 has three axes. District, Product and Month dimensions, to store data 
representing sales over time. The Product dimension includes dimension values Pi-Px^ 
the District dimension includes dimension values Dj.Dy; and the Month dimension 
5 includes dimension values Mj.M^. A data cell is defined to be the intersection of the 
three axes dimension values in the data cube 34. The example data cube 34, therefore, 
has a volume V of data cells given by: 
(x)x(y)x(z) 

Although there are V data cells defined by the data cube, all data cells may not 

10 be populated with data. In fact, in many dimensions, the data cube may be sparsely 
populated. As an illustration, subsections 42, 44 of the data cube may be the only 
populated regions of the data cube 34. By allocating storage for the entire cube volume 
V, there may be unused storage space. It is therefore desirable to reduce the amount of 
wasted storage allocation. 

1 5 Figs. 3A-3B are schematic diagrams of a prior art storage array for storing the 

data cube of Fig. 2. As illustrated in Fig. 3 A, each combination of dimension values in 
the array 50 is allocated a storage cell. It is advantageous to represent the array 50 in 
such a way that data values fi:-equently accessed together are physically stored near each 
other. For example, if data for all months for a given Product-District pair were 

20 commonly to be used together (for example, to calculate a year-to-date total for that 
Product and District), then access would be speeded by storing all months together for 
each combination of Product and District. This establishes a default looping order for 
the dimensions of the data cube wherein Month varies fastest. Other access patterns 
might then dictate that Products vary within Districts, making District the slowest 

25 varying dimension. 

Fig. 3B is an alternative representation of the storage array 50 of Fig. 3 A, more 
clearly depicting the rate of change of the dimensions. As more clearly depicted, the 
lowest (fastest) detail level is the Month level, the middle level L2 is the Product 
level, and the upper (slowest) level L3 is the District level. 
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In a practical data cube there would likely be lower levels (e.g., day values) and 
higher levels (e.g., states, regions). As the number of levels increases, the storage 
requirements of the databases increases exponentially. Thus, sparse data at the lowest 
detail level propagates to large areas of unused storage. To reduce the amount of 
5 wasted storage, the storage structure can be compressed to reduce or eUminate sparse 
data storage or empty data cells. 

As an example, consider a company that sells ice and coal products throughout 
the country. Two districts serviced by the company are Nome, Alaska, and Miami, 
Florida. The company tracks sales by month and the database is first created in April 
1 0 using the January-March data given below in Table I, 



TABLE I 



15 



January: 


Ice 


Coal 


Nome 


0 


10 


Miami 


6 


0 








February: 






Nome 


0 


10 


Miami 


6 


0 








March: 






Nome 


0 


9 


Miami 


10 


0 









Fig. 4 is a schematic diagram of a prior art storage scheme for the data in Table 
25 L The data is stored in a linearized array 55 of data cells, indexed by values of the three 
dimensions. Because there were no sales in either month for ice in Nome and coal in 
Miami, there are unused storage cells. These six empty cells could be eliminated from 
the storage scheme without affecting the integrity of the data in the database 34. 

The sparse data is typically represented in a maimer that avoids actual allocation 
30 of storage locations for data values corresponding to unused combinations of attributes, 
or tuples. In such a multidimensional database, the sparse data is often stored in a 
composite tuple structure which provides storage space only for data values that are 
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actually populated. Creation and management of composite tuples is described in 
further detail in U.S. patent application No. 5,943,677, entitled "Sparsity Management 
for Multi-Dimensional Databases," assigned to the assignee of the present application 
and incorporated herein by reference. 
5 Fig, 5 shows a storage array for storing the sparse data of Fig. 4 in a composite 

tuple. A composite tuple combines dimensions having unused tuples into a single 
composite dimension, in this example <Nome coal> and <Miami ice>. No storage 
allocation is required for the unused tuples <Nome ice> and <Miami coal>. 

Since the composite tuples representing sparse data define storage locations only 

1 0 for data values that are actually populated, the memory allocation and indexing for 

accessing the sparse data may not conform to the data cube model described above with 
reference to Figs. 3a and 3b, which allocate a storage location for every possible tuple, 
or combination of dimensions, in the multidimensional database. Rather, a Cartesian 
compoxmd descriptor 70 is used to index into a storage array 60 to index the sparse data 

15 values 62. A plurality of storage arrays 60, each indexed through a Cartesian compound 
descriptor 70, may therefore be employed to represent the sparse data segments in a 
multidimensional database. Such sparse data segments may correspond to the 
subsections 42, 44 of populated data in the data cube model of Fig. 2. 

The Cartesian compound descriptors reference a storage array 60 stored in an 

20 arbitrary location in the memory 10 or in the cache 18 (Fig.l). The data corresponding 
to the entire data cube may not be stored in the linear, contiguous array manner depicted 
in Figs. 3a and 3b. Accordingly, an aggregation operation may not assume, for 
example, that data residing at the lowest level, LI (Fig. 4) my be aggregated by 
summing consecutive locations. Rather, the Cartesian compound descriptors 70 are 

25 used to index the data values at an arbitrary storage location in the memory 10 and 
cache 18 (Fig. 1). Many references to different Cartesian compound descriptors 70 
indexing different storage arrays 60 may be invoked in an aggregation operation 
involving sparse data. 
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Fig. 6 shows an OLAP system for storing sparse data in a manner which 
minimizes I/O retrieval during an aggregation operation. Sparse data is represented on a 
storage device 30, as storage segments Sla-Slc, such pages on a disk. Each of the 
storage segments 31 generally, corresponds to a memory page 1 la-1 Ic in the memory 
5 10 and cache 18. The cache 18 is a high-speed area of memory adapted to store storage 
segments which are statistically likely to be fetched within a short time interval. Since 
the storage device 30 contains many more storage segments than can be stored in the 
memory 10, the memory pages 11 are transferred to and from the corresponding storage 
segments 31 as they are requested. The sparsity manager 15 stores the associated, or 

10 child, values corresponding to an aggregate, or parent, value in proximity to each other 
on the storage device 30, and optimally, adjacent to other child values on the same 
storage segment 3 1 . When an aggregation operation requests an associated value, other 
associated values also needed for the corresponding aggregate value will also be 
fetched, as will be described in further detail below. 

15 In Figs. 7a and 7b, hierarchies of attributes for the time and region dimensions 

are shown, respectively. The data values corresponding to these dimensions is shown in 
Fig. 8a for the product coal and in Fig. 8b for the product ice. Referring to Fig. 7a, the 
data values for January (Jan), February (Feb), and March (Mar) are aggregated to Ql. 
The data values for April (Apr), May (May), and June (Jun) are aggregated to Q2. 

20 Similarly, the data values for Ql and Q2 are aggregated to the data value for half '00. 
In the region dimension, data values for Miami and Boston (Bos) are aggregated for 
East (E), while Nome and LA are aggregated for West (W). E and W are aggregated for 
US. In Figs. 7a and 7b, the data values at the bottom level of the hierarchy are defined 
as quantifiable entities, called detail values. The higher levels are defined as an 

25 aggregation of the lower levels, and are called aggregate values because they are 
computed from the data values of other attributes. 

In Figs. 8a and 8b the data values corresponding to the detail values of the 
hierarchies of Figs. 7a and 7b are shown. During an aggregation operation, defined 
fiirther below, the data values corresponding to each of the attributes are traversed in the 
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order defined by the hierarchy. For example, to aggregate the aggregate value 
corresponding to the attribute Ql, the detail values for the attributes Jan, Feb, and Mar 
are traversed. By storing the detail values for Jan, Feb, and Mar on the same storage 
segment, a single fetch can be performed to satisfy the aggregation of Ql. 
5 A flovv^chart depicting the aggregation process is shown in Fig. 9. Referring to 

Figs. 6 and 9, each of the sparse dimensions having a hierarchy of attributes is 
identified, as shown at step 100. An iteration for each of the sparse dimensions is 
established, as depicted at step 102. Within each sparse dimensions, the hierarchy of 
associated attributes is retrieved, as disclosed at step 104. The levels defined by the 

10 hierarchy are identified, as shown at step 106. For each level within the hierarchy, an 
attempt is made to fetch the data values associated with an aggregate value, as disclosed 
at step 108. First, a cache index is performed to attempt to retrieve the values fi-om the 
cache 18, as shown at step 110, If there is a cache hit, as determined at step 1 1 1; the 
data values are retrieved firom the cache, as shown at step 1 12. If the data values are not 

1 5 found in the cache, a fetch is performed to retrieve the corresponding storage segments 
31 fi-om the storage device 30, as disclosed at step 1 14. The associated data values are 
aggregated to determine the aggregate value, as described at step 116. If there are more 
data values on the current level associated with aggregate values on the next level, 
execution continues at step 108, as depicted at step 118. The data values associated 

20 with the next aggregate value are likely to have been fetched previously, and will 
therefore tend be found in the cache 18. 

If the levels of the current dimension have not been fully aggregated, as 
determined at step 120, execution continues with the next level of the hierarchy, as 
shown at step 122. When all levels of the current hierarchy of the current dimension 

25 have been aggregated, as determined at step 120, execution continues with the next 
dimension, as depicted at step 124. In this manner, all the data values in the database 
are aggregated while minimizing disk I/O. 

The aggregation operation defined in Fig. 9 optimizes disk I/O because the data 
values used in computing the aggregate values are stored adjacently to each other, as 
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will be described further below. Therefore, a retrieval operation that results in a fetch of 
a storage segment 31 from the storage device 30 is likely to fetch other data values 
associated with the same aggregate value. Accordingly, successive retrievals of data 
values are likely to be found in the cache. 
5 In one embodiment, all the data values associated with an aggregate value or 

level of aggregate values are stored on the same storage element in the storage medium. 
In such an embodiment, a single fetch is performed to retrieve all the data values 
required for aggregation of a particular aggregate value or level of aggregate values. 
Many factors affect the storage locations of data values on the storage medium. Factors 

10 such as number of associated data values corresponding to an aggregate value, the 
number of aggregate values on a level, and the physical size of the storage element, as 
well as others, all affect the data values that are retrieved in a single fetch operation. 
Further, there is rarely a direct mapping from data values associated with an aggregate 
value or a hierarchy level to a storage element. Often the data values associated with an 

1 5 aggregate value or a hierarchy level may span several storage elements, or an individual 
storage element may contain the data values associated with several aggregate values. 
The number of storage elements reflected in the cache also affects whether a particular 
data value will be found in the cache. Accordingly, it is an objective to store the data 
values on the storage medium in an order such that the likelihood of retrieving a data 

20 value from the cache are optimized, since many of the data values returned from a fetch 
will likely be used in an aggregation before the cache entry in which they are stored 
expires from the cache. 

As indicated above, the ordering of the data values on the storage elements in 
the storage device determines the likelihood that a single fetch will satisfy subsequent 

25 data value retrieval attempts. Referring to Figs. 10a and 10b, ordering of the data values 
on the storage device is described with respect to an aggregation operation for sales 
volume V() on the time and region dimensions of Figs. 7a and 7b, respectively, for the 
product coal as indicated in Fig. 8a. 
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Referring to Fig. 10a, an aggregation along the time dimension is shown. The 
data value column 40 indicates the data values from the hierarchy of Fig. Va, ordered as 
sales volume V(region, time). The bottom level 46 therefore stores the detail values for 
V(BoSj Jan..Jun) and V(Nome5 Jan..Jun). The bottom level 46 is retrieved from the 
5 storage location position indicated in column 44, to yield the detail values in column 42, 
corresponding to the detail values of Fig. 8a. 

The middle level 48 aggregation involves computing the middle level aggregate 
values in column 42 from the associated data values on the bottom level 46 detail 
values, and storing in the storage location indicated column 44. The middle level 48 
10 sales volume, therefore, ranges from V(Bos, Q1..Q2) and V(Nome, Q1..Q2) as indicated 
by the hierarchy of Fig. 7a. 

The top level 50 aggregation involves computing the aggregate value V(Bos, 1'^ 
half '00) and V(Nome, 1'^ half '00), as indicated in column 40. The resultant aggregate 
values in column 42 are computed from the associated data values from the middle level 
15 48 aggregate values in column 42, and stored in storage location indicated in column 44. 

After the aggregation operation has been completed for the time dimension, the 
aggregation operation for the region dimension commences, as shown in Fig. 10b. The 
aggregation operation for the region dimension determines associated data values 
according to the hierarchy of Fig. 7b. The volume V() range therefore includes 
20 V(Bos..LA, Jan..Jun). In this instance, the detail values for V(Bos, Jan.Jun) and 

V(Nome, Jan..Jun) are the same detail values used in the time aggregation of Fig. 10a, 
and V(Miami, Jan..Jun) and V(LA, Jan.,Jun) are sparse, and therefore not applicable, 
and treated as zero, in the example given. Accordingly, the values for V(E, Jan..Jun) 
and V(W, Jan..Jun) in column 40 are computed as shown in colixmn 42, corresponding 
25 to the middle level of the region hierarchy 52 of Fig. 7b. The aggregate values 

associated with V(E, Jan..Jun) and V(W, Jan..Jun) are stored at the storage locations 
indicated in column 44. 

Refer to Fig. 10c, the aggregation for the time dimension are computed for the 
second level of the region dimension 53 V(E..W, Q1..Q2) and V(E..W, V half), and 
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stored in the storage locations indicated in column 44. The top level 54 aggregation, 
V(US, Jan..Jun), aggregates the values associated with V(E, Jan. Jun) and V(W, 
Jan..Jun) as shown in column 42, and stores the resultant aggregate values at the storage 
locations indicated in col. 44 for the top level 54. Finally, V(US, Q1..Q2) and V(US, V 
5 half) complete the aggregation operation for the remaining aggregate values indicated 
56. The example given here is illustrative. In a larger multidimensional database, 
further aggregations could be performed with additional dimensions and attributes 
defined in the hierarchy. 

Fig. 1 1 shows one embodiment employing a disk as the storage medium 

10 comprising disk pages as the storage elements. In accordance with the storage location 
position for the time dimension of Fig. 7a, described above with respect to Figs. 10a, 
each disk page stores four data values. As indicated eariier, there need not be an exact 
mapping from aggregate values or hierarchy levels to disk pages. Disk page size, data 
value size, and cache size may vary from system to system. In accordance with the 

1 5 system as defined by the present claims, the ordering of the data values on the disk is 
such that a page fetch is likely to result in fetching other, associated data values that will 
be used in computing the same aggregate value. 

Referring to Fig. 1 1, the disk pages storing the data values of corresponding to 
column 44 of Fig. 10a is shown. The disk page 31a contains storage locations 1-4 

20 corresponding to V(Bos, Jan.. Apr). Disk page 31b contains storage locations 5-8 for 
V(Bos, May..Jun) and V(Nome, Jan..Feb). Disk page 31c contains storage locations 9- 
12 for V(Nome, Mar..Jun). Disk page 3 Id contains storage locations 13-16 for 
aggregate values V(Bos..Nome, Q1,.Q2). Finally, disk page 31e contains storage 
locations 17-18 for aggregate values V(Bos..Nome, 1^^ half '00), with two remaining 

25 storage locations. In this manner, only 5 page fetches from the disk are required for an 
aggregation operation along the time dimension of Fig. 7a. 

Those skilled in the art should readily appreciate that the programs defining the 
operations and methods defined herein are deliverable to a computer in many forms, 
including but not limited to a) information permanently stored on non-writeable storage 
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media such as ROM devices, b) information alterable stored on writeable storage media 
such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and 
optical media, or c) information conveyed to a computer through communication media, 
for example using baseband signaling or broadband signaling techniques, as in an 
5 electronic networks such as the Intemet or telephone modem lines. The operations and 
methods may be implemented in a software executable out of a memory by a processor 
or as a set of instructions embedded in a carrier wave. Alternatively, the operations and 
methods may be embodied in whole or in part using hardware components, such as 
Application Specific Integrated Circuits (ASICs), state machines, controllers or other 
10 hardware components or devices, or a combination of hardware and software 
components. 

While the system and method for storing data values in a multidimensional 
database has been particularly shown and described with references to embodiments 
thereof, it will be understood by those skilled in the art that various changes in form and 
1 5 details may be made therein without departing from the scope of the invention 
encompassed by the appended claims. Accordingly, the present invention is not 
intended to be limited except by the following claims. 
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CLAIMS 

What is claimed is: 

1 . A method of storing data values in a multidimensional database comprising: 

identifying a plurality of dimensions, wherein each of the dimensions is 
5 indicative of a plurality of storage locations; 

identifying a hierarchy of attributes within at least one of the dimensions, 
wherein the hierarchy is indicative of an association between the attributes; 
attributing a plurality of data values to each of the attributes; and 
storing the data values on a storage medium based on the data values 
1 0 indicated by the hierarchy. 

2. The method of Claim 1 wherein the data values are stored on the storage 
medium in proximity to associated data values, wherein the associated data 
values are attributed to associated attributes as indicated by the hierarchy. 

3. The method of Claim 2 wherein the data values further comprise aggregate 
1 5 values and detail values. 

4. The method of Claim 3 wherein each of the aggregate values includes at least 
one of other data values. 

5. The method of Claim 3 wherein the aggregate values comprise at least one detail 
value. 



20 6. 



The method of Claim 3 wherein the aggregate values further comprise at least 
one data value selected from the group consisting of aggregate values and detail 
values. 
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7, The method of Claim 3 wherein the association is a parent-child association 
between an aggregate value and at least one child data value. 

8. The method of Claim 3 wherein the association is between an aggregate value 
and at least one data value. 

5 9. The method of Claim 3 wherein the hierarchy defines a plurality of levels, 

wherein each of the aggregate values on one level includes the data values on a 
successive level. 

10. The method of Claim 9 wherein the association is an inclusion of the plurality of 
1 0 data values in the aggregate value. 

1 1 . The method of Claim 3 wherein each of the data values associated with an 
aggregate value are stored proximate to the other data values associated with the 
same aggregate value as indicated by the hierarchy. 

12. The method of Claim 1 1 wherein the proximate values are stored adjacently. 

15 13. The method of Claim 3 wherein storing further comprises storing the aggregate 
value on the storage medium adjacent to the associated data values. 

14. The method of Claim 13 wherein the storage medium further comprises storage 
segments, wherein the data values in a storage segment are manipulated 
concurrently. 

20 15. The method of Claim 14 wherein the storage medium is a disk cache and the 
storage segments are cache pages corresponding to pages on a disk. 
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1 6. The method of Claim 1 5 wherein storing the proximate data values further 
comprises storing the data values on a common cache page. 

17. The method of Claim 14 wherein the storage medium is a disk and the storage 
segments are disk pages. 

5 18. The method of Claim 17 wherein the storing the proximate data values further 
comprises storing on a common disk page. 

1 9. The method of Claim 3 further comprising the steps of: 

aggregating at least one of the dimensions having a hierarchy by 
traversing each of the aggregate values included in the dimension; and 
10 including, in an aggregation totals the associated data values 

corresponding to the aggregate value. 

20. The method of Claim 19 wherein aggregating includes: 

traversing, for a first aggregate value on a first level, each of the data 
values on a second level associated with the aggregate value; and 
15 subsequently traversing each of the other aggregate values on the first 

level via traversing the data values on a second level associated with the 
subsequent aggregate value. 

21. The method of Claim 20 wherein traversing further includes 

fetching, from the storage medium, a storage segment having data values 
20 corresponding to the aggregate value, wherein each of the corresponding data 

values are stored in an adjacency. 



22. 



The method of Claim 21 wherein the storage segment is a memory cache page. 
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23. The method of Claim 21 wherein the storage segment is a disk page. 

24. A system for storing and accessing a multidimensional database comprising: 

a memory having a cache and an database engine; 

a mass storage device in communication with the memory and operable 
5 to store a plurality of data values; 

a kernel included in the database engine, wherein the kernel is operable 
to manipulate data values between the memory, the cache, and the mass storage 
device; and 

a sparsity manager included in the database engine, wherein the sparsity 
10 manager is operable to determine a storage organization of the data values from 

a predetermined hierarchy. 

25. The system of Claim 24, wherein the data values ftirther comprise aggregate 
values and detail values as defined by the predetermined hierarchy. 

26. The system of Claim 25 wherein the aggregate values are further comprised of 
1 5 aggregate values and detail values. 

27. The system of Claim 26 wherein the aggregate values are further comprised of 
other data values. 

28. The system of Claim 25 wherein the hierarchy further defines levels, wherein the 
aggregate values on a first level comprise data values on a second level. 

20 29. The system of Claim 25 wherein each of the aggregate values is associated with 
at least one of other data values as indicated by the hierarchy. 
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30. The system of Claim 28 wherein each of the data values associated with a 
common aggregate value are stored in proximity. 

3 1 . The system of Claim 30 wherein the proximity is an adjacency. 

32. The system of Claim 30 wherein the mass storage device comprises storage 

5 segments, wherein the storage segments are indicative of a set of data values that 

are manipulated together. 

33. The system of Claim 32 wherein each of the data values stored in proximity are 
stored on a common storage segment. 

34. The system of Claim 32 wherein the storage segments comprise disk pages. 

10 35. The system of Claim 32 wherein the storage segments correspond to cache 
pages. 

36. A computer program product having computer program code for storing data 
values in a multidimensional database comprising: 

computer program code for identifying a plurality of dimensions, 
15 wherein each of the dimensions is indicative of a plurality of storage locations; 

computer program code for identifying a hierarchy of attributes within at 
least one of the dimensions, wherein the hierarchy is indicative of an association 
between the attributes; 

computer program code for attributing a plurality of data values to each 
20 of the attributes; and 

computer program code for storing the data values on a storage medium 
in proximity to associated data values, wherein the associated data values are 
attributed to associated attributes as indicated by the hierarchy. 
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A computer data signal having program code for storing data values in a 
multidimensional database comprising: 

program code for identifying a plurality of dimensions, wherein each of 
the dimensions is indicative of a pluraHty of storage locations; 

program code for identifying a hierarchy of attributes within at least one 
of the dimensions, wherein the hierarchy is indicative of an association between 
the attributes; 

program code for attributing a plurahty of data values to each of the 
attributes; and 

program code for storing the data values on a storage medium in 
proximity to associated data values, wherein the associated data values are 
attributed to associated attributes as indicated by the hierarchy. 

A system for storing data values in a multidimensional database comprising: 
means for identifying a pluraHty of dimensions, wherein each of the 

dimensions is indicative of a plurahty of storage locations; 

means for identifying a hierarchy of attributes within at least one of the 

dimensions, wherein the hierarchy is indicative of an association between the 

attributes; 

means for attributing a plurahty of data values to each of the attributes; 

and 

means for storing the data values on a storage mediiun in proximity to 
associated data values, wherein the associated data values are attributed to 
associated attributes as indicated by the hierarchy. 

A method of storing data values in a multidimensional database comprising: 

identifying a plurality of dimensions, wherein each of the dimensions is 
indicative of a plurality of attributes associated with a data value; 
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identifying a hierarchy within at least one of the dimensions, wherein the 
hierarchy is indicative of an association between the plurality of attributes; 

assigning a plurality of data values to each of the plurahty of attributes; 

storing the data values on a storage medium in proximity to associated 
data values, wherein the associated data values are assigned to associated 
attributes as indicated by the hierarchy. 

A method of storing data values in a multidimensional database comprising: 

identifying a plurality of dimensions, wherein each of the dimensions is 

indicative of a plurality of storage locations; 

identifying a hierarchy of attributes within at least one of the dimensions, 

wherein the hierarchy is indicative of an association between the attributes; 

attributing a plurality of data values to each of the attributes; and storing 

the data values on a storage medium on the same disk page as associated data 

values, wherein the associated data values are attributed to associated attributes 

as indicated by the hierarchy. 
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MULTIDIMENSIONAL DATABASE STORAGE AND RETRIEVAL SYSTEM 

ABSTRACT OF THE DISCLOSURE 

In a multidimensional database, an aggregation operation is performed in an 
optimal manner by storing the values included in the aggregation operation on the same 
5 disk page. A sparsity manager determines aggregate values that are computed from 
other data values during the aggregation operation. Each aggregate value is associated 
with one or more data values that are used during the aggregation operation to compute 
the aggregate value. The sparsity manager stores the associated data values in 
proximity to each other, such as on the same disk page, so that multiple disk page 
10 fetches may not be required for the same set of data values during the aggregation 
operation. The data values used in the aggregation operation can therefore be fetched 
once from a disk page, and thereafter are found in memory, such as on a cache page 
corresponding to the disk page. In this manner, multiple fetches for the same disk page 
during the aggregation operation are avoided. 



CD 
> 



f°i 

0) 




oCOl 
O 



:rO- 



CD 









Engine 
12 



.t: CD 
CO CP 

2 

GO <^ 



OJ 



i 






J 






i 








OJ 
















CM 






cx 












r 


CO 


1 








CvJ 




( 






) 






e • V 


ZD 


a 






CL 




a. 


u 




O 




o 



c 
O 



O 
CM 



OJ 

c 

O 



IJL 











r 


O 




o 





Q 








Q 




CM 












CL 






O 


CL 


CM 


Q 


qT 






OJ 

Q 


X 

CL 


NJ 




Q 




OJ 


CM 
O 








CM 


a. 






CM 

Q 


a. 


CM 


CM 

Q 


qT 




cS 








Q 


a? 


CM 


c5" 








Q 








£5" 


CM 


CM 








c5" 


ClT 






o 


a, 


CM 


c5" 


CL 


2" 



< 



o 
in 

















CM 








Q 








Nl 




CL 


CVJ 


























CM 








CM 

Q 








Nl 










CL 


CM 
















■ 








CM 






2" 




CM 



CD 

CD 
Ll 



in 

ID 



\ 



t 
• 








O 


O 




Coa 


Feb 


o 


jUJD! 




Jan 


o 






Mar 






Ice 


Feb 








Jan 








Mar 






Coal 


Feb 




me 




Jan 


o 


o 




o 


o 






Feb 


o 






o 


o 


/ 


f 






rO 


CM 







o 

II 

LU It 
CO ^ 

ii: 

O ^ 



O 



o 
o 
O 
a> 
E 
o 



if 

E 
o 



5: 
o 



CO t/> 



o 

CD 



/ 



o 




CM 


n 


If 










c 


JQ 


w 


o 




a 






















to 




o 


o 




"4-. 






»— f 








I'* half '00 ^ 

Ql Q2 

Jan Feb Mar Apr May Jun 
Fig. 7a 




Miami Bos Nome LA 



Fig, 7b 



Coal 





Jan 


Feb 


Mar 


Apr 


May 


Jun 


Miami 


< 


SPARSE 






> 


LA 


< 


SPARSE 






> 


Nome 


7 


7 


6 


4 


2 


0 


Bos 


10 


10 


9 


7 


3 


0 




Fig. 8a 












Ice 
















Jan 


Feb 


Mar 


Apr 


May 


Jun 


Miami 


6 


6 


7 


8 


8 


9 


LA 


3 


4 


4 


5 


5 


7 


Nome 


< 


SPARSE 






> 


Bos 


< 


SPARSE 






> 



Fig. 8b 



Identify sparse dimensions 



100 



Establish iteration for each of 
the sparse dimensions 




Fetch corresponding storage 
segments from storage 
device 



Fig. 9 



Sales 


Volume 


Position 


V (Bos, Jan) 


=7 




V (Bos, Feb) 


-7 




V (Bos, Mar) 


-6 


(3) 


V (Bos, Apr) 


=4 


(4) 


V (Bos, May) 


=2 


(5) 


V (Bos, Jun) 


=0 


/e:\ 

(o) 


V (Nome, Jan) 


=10 


in 


V (Nome, Feb) 


=10 


(8) 


V (Nome, Mar) 


=9 


(9) 


V (Nome, Apr) 


-7 


(10) 


V (Nome, May) 






V (Nome, Jim) 


=0 


(12) 


V (Bos, Ql) 


V (Bosjan) + V (Bos,Feb) + (Bos,Mar) 

-7 +7 +6 =20 


(13) 


V (Bos, Q2) 


V (Bos, Apr) + V (Bos, May) + V (Bos, Jun) 

-4 -f-2 +0 ==6 


(14) 


V (Nome, Ql) 


V (Nome, Jan) + V (Nome,Feb) + (Nome,Mar) 

= 10 +10 +9 =29 


(15) 


V (Nome, Q2) 


V (Nome,Apr) + V (Nome,May) + V (Nome, Jun) 
= 7 +3 +0 =10 


(16) 


V(Bos, half '00) 


V(Bos,Ql) +V(Bos,Q2) 

-20 +6 =26 


(17) 


V(Nome, P^half) 


V (Nome, Ql) + V (Nome, Q2) 

==29 +10 =39 


(18) 



Fig. 10a 



HO 






9^ 


Sales 


Volume 


Position 


V (E, Jan) 


V (Bos, Jan) + V (Miami, Jan) 
=7 +0 


= 7 


(19) 


V (E, Feb) 


V (Bos, Feb) + V (Miami, Feb) 
-7 +0 


= 7 


(20) 


V (E, Mar) 


V (Bos, Mar) + V (Miami, Mar) 
=6 +0 


-6 


(21) 


V(E,Apr) 


V (Bos, Apr) + V (Miami, Apr) 
-4 +0 


-4 


(22) 


V (E, May) 


V (Bos, May) + V (Miami, May) 
-2 4-0 


= 2 


(23) 


V (E, Jun) 


V (Bos, Jun) + V (Miami, Jun) 
-0 +0 


-0 


(24) 


V (W, Jan) 


V (Nome, Jan) + V (LA, Jan) 
=10 +0 


= 10 


(25) 


V (W, Feb) 


V (Nome, Feb) + V (LA, Feb) 
-10 . +0 


= 10 


(26) 


V (W, Mar) 


V (Nome, Mar) + V (LA, Mar) 
-9 +0 


= 9 


(27) 


V (W, Apr) 


V (Nome, Apr) + V (LA, Apr) 
-7 +0 


= 7 


(28) 


V (W, May) 


V (Nome, May) + V (LA, May) 
=3 +0 


-3 


(29) 


V (W, Jun) 


V (Nome, Jun) + V (LA, Jun) 
-0 +0 


= 0 


(30) 



Sales 


Volume 


POBition 


V(E. Ql) 


= V(Bos,Ql) + V(M3ami,Ql) 
= 20 + NA = 20 


(31) 


V(E, Q2) 


= V(Bos,Q2) + V{Miami,Q2) 
= 6 + NA = 6 


(32) 


V(W, Ql) 


= V(Nome,Ql) + V{LA,Q1) 
= 29 + NA = 29 


(33) 


V(W, Q2) 


= V{Nonie,Q2) + V(LA,Q2) 
« 10 + NA = 10 


(34) 


V(E, half) 


= V{Bos,l'* haJf) + V(Miaxiu,l'* half) 
= 26 + NA = 26 


(35) 


V(W, 1^* half) 


= V(Nome,Ql»< half) + V{LA,1*« half) 
= 39 + NA = 39 


(36) 


V(US, JajD) 


= V(E,Jaa) + V(W,Jaji} 
= 7 + 10 = 17 


(37) 


V(US, Feb) 


= V(E,Feb) + V(W,Feb) 
= 7 + 10 = 17 


(38) 


V(US, Mar) 


= V(E,Mar) + V(W,Mar) 
= 6 + 9 = 15 


(39) 


V(US, Apr) 


= V{E,Apr) + V(W,Apr) 
= 4 + r = 11 


(40) 


V(US, May) 


= V(E,May) + V(W,May) 
=2+3=5 


(41) 


V{US, Jun) 


= V{E,Jun) + V(W,Jun) 
=0+0=0 


(42) 


V(US, Ql) 


= V(E,Q1) + V(W,Q1) 
= 20 + 29 = 49 


(43) 


V(US, Q2) 


= VCE,Q2) + V(W,Q2) 
= 6 + 10 = 16 


(44) 


V(US, half) 


= V(E,1" half) + V(W,1" half) 
= 26 + 39 = 65 


(45) 



V (Bos, Jan) 


=7 


(1) 


V (Bos, Feb) 


-7 


(2) 


V (Bos, Mar) 


-6 


(3) 


V (Bos, Apr) 


=4 


(4) 




V (Bos, May) 


-2 


(5) 


V (Bos, Jun) 


-0 


(6) 


V (Nome, Jan) 


=10 


(7) 


V (Nome, Feb) 


-10 


(8) 




V (Nome, Mar) 


-9 


(9) 


V (Nome, Apr) 


=7 


(10) 


V (Nome, May) 


-3 


(11) 


V (Nome, Jun) 


=0 


(12) 




V(Bos, Ql) 


=20 


(13) 


V (Bos, Q2) 


-6 


(14) 


V(Nome, Ql) 


=29 


(15) 


V (Nome, Q2) 


-10 


(16) 




V(Bos, half '00) 


-26 


(17) 


V(Nome, half =00) 


=39 


(18) 




DOCKET NO. 1958.2004-000 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

Declaration for Patent Application 
As a named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated next to my name; 

I believe I am the original, first and sole inventor (if only one name is listed) or an original, first and joint 
inventor (if plural names are hsted in the signatory page(s) commencing at page 3 hereof) of the subject matter 
which is claimed and for which a patent is sought on &e invention entitled 

MULTIDIMENSIONAL DATABASE STORAGE AND RETRIEVAL SYSTEM 



the specification of which (check one) 
[X ] is attached hereto. 



[] 



was filed on 



. as United States Application 



Number or PCT International Application No. 

and was amended on (if applicable), 

I hereby state that I have reviewed and understand the contents of the above-identified specification, 
including the claims, as amended by any amendment referred to above. 

I acknowledge the duty to disclose information which is known by me to be material to patentability as 
defined in 37 C.RR. §1.56. 

I hereby claim foreign priority benefits under 35 U.S.C. 1 19(a)-(d) or 365(b) of any foreign application(s) 
for patent or inventor's certificate or 365(a) of any PCT international application which designated at least one 
country other than the United States of America, listed below and have also identified below, by checking the box, 
any foreign application for patent or inventor's certificate, or of any PCT international application having a filing 
date before that of the application on which priority is claimed: 



Prior Foreign Application(s) 



Priority 

Not 
Claimed 



Certified 
Copy Filed? 
YES NO 



(Number) 


(Country) 


(Day/MonthA^ear filed) 


(Number) 


(Country) 


(Day/MonthA^ear filed) 



] 



] 



[ ] 



[ ] 
[ ] 
[ ] 



[ ] 
[ ] 
[ ] 



(Number) (Country) (Day/MonthA'ear filed) 

Thereby claim the benefit under 35 U.S.C. § 119(e) of any United States provisional application(s) listed below. 



(Application Number) 



(Filing Date) 



(Application Number) 



(FiUng Date) 



-2- 



I hereby claim the benefit under 35 U.S.C. 120 of any United States appUcation(s), or 365(c) of any PCX 
international application designating the United States of America, listed below and, insofar as the subject matter of 
each of the claims of this appHcation is not disclosed in the prior United States or PCT International application in 
the manner provided by the first paragraph of 35 U.S.C. 1 12, 1 acknowledge the duty to disclose information known 
by me to be material to patentability as defmed in 37 C.F.R. 1,56 which became available between the filing date of 
the prior application and the national or PCT international filing date of this application: 



(AppHcation Serial No.) 


(Filing date) 


(Status: patented, pending, abandoned) 


(AppHcation Serial No,) 


(Filing date) 


(Status: patented, pending, abandoned) 


(Application Serial No.) 


(Filing date) 


(Status: patented, pending, abandoned) 



(AppHcation Serial No.) (Filing date) (Status: patented, pending, abandoned) 



As a named inventor, I hereby appoint the attorneys and/or agents associated with 
Hamilton, Brook, Smith & Reynolds, P,C., Two Militia Drive, Lexington, Massachusetts 02421-4799 
Customer No. 21005, 

and 

to prosecute this application and to transact all business in the Patent and Trademark Office connected therewith. 

Please send correspondence to: 

[ X ] Customer No. 21005 

HAMILTON, BROOK, SMITH & REYNOLDS, P.C. 
Two Militia Drive 
Lexington, MA 02421-4799 

or 

[ ] Address as follows; 



Direct telephone calls to: Rodney D. Johnson, Esq. Telephone No.: 781-861-6240 

Direct facsimiles to: Rodney D. Johnson, Esq. Facsimile No.: 781-861-9540 



I hereby declare that all statements made herein of my own knowledge are true and that all statements 
made on mformation and belief are believed to be true; and further that these statements were made with the 
knowledge that willful false statements and the like so made are punishable by fme or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity 
of the application or any patent issued thereon. 



Full name of sole 

or first inventor Caleb E. Welton 

Inventor's Signature Date 7/*j/ ^op^ 

Residence 6 Prichard Avenue i>Q QocLa 

Somerville. MA 02144 MM^n J^^ O'XIH^ ^^ '^ 

— f — 

Citizenship United States of America 



Post Office Address 



Same as above 



Full name of second joint 

inventor, if any Albert A Hopeni^ , ^ } 

Inventor's Signature fAAA Date -^tU^ /'^^^C^C a^-) 

Residence 1 Monadnock Road 

Arlington, MA 02476 

Citizenship United States of America 

Post Office Address Same as above 



Full name of third joint 

inventor, if any 

Inventor's Signature 

Residence ^ 



Andiew H. Goldberg 



Date 



23 Stafford Road 



Newton Centre, MA 02459 



Citizenship _ 



Post Office Address 



United States of America 



Same as above 



: tODMAVMHODM AMM ANAGE; 147145;! 



